
feat(kernel): port resilience primitives from daemon#37

Merged
stackbilt-admin merged 2 commits into main from feat/port-resilience on Apr 23, 2026

@stackbilt-admin (Member)

Summary

Lifts `kernel/resilience.ts` from the daemon into core. Landing page (`web/src/landing.ts:938`, `:1028`) promised circuit-breaker + cost-tracking to every consumer; this wires the implementation behind the promise. Closes #33.

What's in core now

  • `CircuitRegistry`, `AegisCostTracker` classes (wrap llm-providers' managers)
  • `withRetry` (exponential backoff + jitter, retryable error detection)
  • `withFallback` (primary → fallback with optional predicate)
  • `resilient(service, op, { circuits, fallback?, retry?, circuit? })` combinator
  • `markProviderExhausted` / `isProviderExhausted` (delegate to llm-providers' `defaultExhaustionRegistry`; a single global exhaustion state per process is the intended model)
  • `createResilience({ budgets, onEmergencyBudget? })` factory that returns `{ circuits, costs, ledger }` with the ledger event listener wired inside
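The `withRetry` and `withFallback` bullets above can be sketched as follows. This is a minimal illustration under assumptions, not the code in `kernel/resilience.ts`: the option names (`maxAttempts`, `baseDelayMs`, `maxDelayMs`, `isRetryable`), the default retryable-error heuristic, and the choice of "full jitter" (one common jitter strategy) are all hypothetical.

```typescript
// Hypothetical option names; the real withRetry may differ.
interface RetryOptions {
  maxAttempts?: number; // total attempts, including the first call
  baseDelayMs?: number; // delay cap before the first retry
  maxDelayMs?: number; // upper bound on any single delay
  isRetryable?: (err: unknown) => boolean;
}

// Assumed heuristic: rate limits, 5xx responses and timeouts are transient.
function defaultIsRetryable(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return /\b(429|5\d\d)\b|timeout|ECONNRESET/i.test(msg);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const {
    maxAttempts = 3,
    baseDelayMs = 200,
    maxDelayMs = 5000,
    isRetryable = defaultIsRetryable,
  } = opts;
  let attempt = 0;
  while (true) {
    attempt++;
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt, or on errors a retry won't fix.
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // Exponential backoff capped at maxDelayMs, then full jitter:
      // sleep a uniformly random time in [0, cap).
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * cap);
    }
  }
}

// Primary -> fallback. The optional predicate decides whether a given
// primary failure should trigger the fallback or be rethrown.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  shouldFallback: (err: unknown) => boolean = () => true,
): Promise<T> {
  try {
    return await primary();
  } catch (err) {
    if (!shouldFallback(err)) throw err;
    return fallback();
  }
}
```

Composed, a provider call might read `withFallback(() => withRetry(callGroq), callCerebras)`, where `callGroq` and `callCerebras` are hypothetical stand-ins for provider clients. Jitter spreads retries from many concurrent callers so they don't hit a recovering provider in lockstep.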

What does NOT live in core

Monthly budgets. These are operator-specific: the daemon's `$20` Anthropic / `$5` Groq / `$10` Cerebras caps won't match anyone else's. Core exposes the factory; consumers supply their own numbers. This also keeps core's module load side-effect-free: the ledger event listener is only attached for consumers that call the factory.
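A minimal sketch of this split, assuming hypothetical shapes: the `LedgerEvent` type, the toy `Ledger` and `CostTracker` classes, and the `onEmergencyBudget` signature are illustrations, not core's exported types. The real factory wraps llm-providers' managers and also returns `circuits`, which is left out here. What the sketch shows is that nothing is created or subscribed at import time; the ledger listener exists only inside `createResilience`.

```typescript
// Hypothetical shapes for illustration only.
type Provider = string;

interface LedgerEvent {
  provider: Provider;
  costUsd: number;
}

// Minimal event source standing in for the real ledger.
class Ledger {
  private readonly listeners: Array<(e: LedgerEvent) => void> = [];
  on(listener: (e: LedgerEvent) => void): void {
    this.listeners.push(listener);
  }
  record(e: LedgerEvent): void {
    for (const listener of this.listeners) listener(e);
  }
}

// Per-provider spend, checked against operator-supplied monthly caps.
class CostTracker {
  private readonly spent = new Map<Provider, number>();
  private readonly budgets: Record<Provider, number>;
  constructor(budgets: Record<Provider, number>) {
    this.budgets = budgets;
  }
  add(provider: Provider, usd: number): number {
    const total = this.spentOn(provider) + usd;
    this.spent.set(provider, total);
    return total;
  }
  spentOn(provider: Provider): number {
    return this.spent.get(provider) ?? 0;
  }
  budgetFor(provider: Provider): number | undefined {
    return this.budgets[provider];
  }
}

interface ResilienceOptions {
  budgets: Record<Provider, number>; // USD per month, chosen by the operator
  onEmergencyBudget?: (provider: Provider, spentUsd: number, budgetUsd: number) => void;
}

// Nothing runs at import time. Each call builds fresh instances, and the
// ledger listener is attached here, only for consumers that call the factory.
function createResilience({ budgets, onEmergencyBudget }: ResilienceOptions) {
  const costs = new CostTracker(budgets);
  const ledger = new Ledger();
  ledger.on(({ provider, costUsd }) => {
    const spent = costs.add(provider, costUsd);
    const budget = costs.budgetFor(provider);
    // Assumed trigger: spend has reached or passed the provider's cap.
    if (budget !== undefined && spent >= budget) {
      onEmergencyBudget?.(provider, spent, budget);
    }
  });
  return { costs, ledger };
}
```

The daemon would then pass its own caps, for example `createResilience({ budgets: { anthropic: 20, groq: 5, cerebras: 10 } })`, using the figures quoted above (the provider keys are assumed).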

Scope note (partial #24)

This pulls `@stackbilt/llm-providers` in as a core dep (`file:` link) — which is a first slice of #24 (Phase D adoption). Not touching `cognition.ts`, `groq.ts`, or `claude.ts` call sites in this PR; those remain for #24 proper.

Design

  • Factory, not singletons. Core has no module-scope `circuits`/`costs`/`ledger` — consumers instantiate their own.
  • `resilient` takes `circuits` in `opts` (no module-scope registry to close over). The daemon exposes a thin wrapper that passes its singleton, so daemon call sites don't change.
  • `CircuitBreakerConfig` keeps AEGIS-native naming (`resetTimeoutMs` vs llm-providers' `resetTimeout`) via an internal adapter, so the API surface is unchanged from the daemon's.
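The three design points can be illustrated together in one self-contained sketch. Everything here is an assumption for illustration: `ProviderBreakerConfig` stands in for the llm-providers config (only its `resetTimeout` field name comes from the bullet above), `Breaker` is a simplified closed/open/half-open state machine rather than the wrapped llm-providers manager, `retry` is omitted from `resilient`, and `daemonCircuits`/`daemonResilient` are hypothetical names for the daemon's singleton and thin wrapper.

```typescript
// AEGIS-native config naming (resetTimeoutMs), as kept by core.
interface CircuitBreakerConfig {
  failureThreshold: number;
  resetTimeoutMs: number;
}

// Hypothetical stand-in for the llm-providers config, which uses resetTimeout.
interface ProviderBreakerConfig {
  failureThreshold: number;
  resetTimeout: number;
}

// Internal adapter: callers keep AEGIS naming; the wrapped manager gets its own.
function toProviderConfig(c: CircuitBreakerConfig): ProviderBreakerConfig {
  return { failureThreshold: c.failureThreshold, resetTimeout: c.resetTimeoutMs };
}

// Simplified breaker: opens after N consecutive failures; once resetTimeout
// has elapsed, the next call goes through as a half-open probe.
class Breaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly cfg: ProviderBreakerConfig;

  constructor(cfg: ProviderBreakerConfig) {
    this.cfg = cfg;
  }

  async call<T>(op: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null && Date.now() - this.openedAt < this.cfg.resetTimeout) {
      throw new Error("circuit open");
    }
    try {
      const result = await op();
      this.failures = 0;
      this.openedAt = null; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      // A failed half-open probe, or too many failures, (re)opens the circuit.
      if (this.openedAt !== null || this.failures >= this.cfg.failureThreshold) {
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}

// One breaker per service, created lazily on first use.
class CircuitRegistry {
  private readonly breakers = new Map<string, Breaker>();

  get(service: string, config: CircuitBreakerConfig = { failureThreshold: 5, resetTimeoutMs: 30000 }): Breaker {
    let breaker = this.breakers.get(service);
    if (!breaker) {
      breaker = new Breaker(toProviderConfig(config));
      this.breakers.set(service, breaker);
    }
    return breaker;
  }
}

interface ResilientOptions<T> {
  circuits: CircuitRegistry; // passed in: core has no module-scope registry
  fallback?: () => Promise<T>;
  circuit?: CircuitBreakerConfig;
}

// Core combinator: run op behind the service's breaker; on failure or an
// open circuit, use the fallback if one was given.
async function resilient<T>(service: string, op: () => Promise<T>, opts: ResilientOptions<T>): Promise<T> {
  try {
    return await opts.circuits.get(service, opts.circuit).call(op);
  } catch (err) {
    if (!opts.fallback) throw err;
    return opts.fallback();
  }
}

// Daemon side (hypothetical names): a thin wrapper that closes over the
// daemon's singleton registry, so existing call sites don't change.
const daemonCircuits = new CircuitRegistry();

function daemonResilient<T>(
  service: string,
  op: () => Promise<T>,
  opts: Omit<ResilientOptions<T>, "circuits"> = {},
): Promise<T> {
  return resilient(service, op, { ...opts, circuits: daemonCircuits });
}
```

Passing the registry explicitly keeps core free of module-level state; the cost is one small wrapper on the daemon side.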

Test plan

  • Core typecheck: clean
  • Core tests: 1474 passed, 1 skipped
  • Daemon typecheck against this branch: clean
  • Daemon tests against this branch: 1692 passed, 7 skipped
  • Charter CI passes on this PR
  • Daemon PR (consuming this) opens + passes once this merges

Refs #24, #33, #35

Core now depends on @stackbilt/llm-providers via file: link (#33). Update
CI to check out llm-providers alongside aegis-oss and build it before
running npm ci in web/, matching the pattern the daemon's CI uses.

- Nest aegis-oss checkout under aegis-oss/ so sibling repos resolve the
  file:../../llm-providers path.
- Build llm-providers (publishes from dist/) before core install so the
  file: symlink resolves types.
stackbilt-admin merged commit 77c35c6 into main on Apr 23, 2026
2 checks passed
stackbilt-admin deleted the feat/port-resilience branch on April 23, 2026 at 23:30

Development

Successfully merging this pull request may close these issues.

feat(kernel): port resilience (circuit breakers + retry/backoff) from daemon

1 participant